Data Standardization Best Practices
Follow these best practices to standardize attribute values across tenants.
Data standardization is a key focus during the early stages of data discovery and implementation. Standardized data values across tenants streamline Visier's analytic capabilities and improve the accuracy and significance of your outputs. Standardization affects any dimension used in concept configurations, and those concepts should be configured at the administrating tenant level. For example, when loading data to the Compensation Payout event, the Payout Type dimension is used to configure the Total Cost of Workforce concept.
Comparing and aggregating data is challenging when tenants store it in different formats, units, or values. Standardizing data enables consistent, comparable insights across tenants, allowing a single solution to be deployed without tenant-specific modifications. It is crucial for meaningful comparisons and benchmarking, allowing customers to assess their performance against industry peers. Without standardization, scaling and maintaining deployments become time-consuming and error-prone.
Best practices
Centralize standardized values in the administrating tenant
Use the administrating tenant as the central hub for managing a superset of standardized dimension values. The analytic tenants inherit the configurations from the administrating tenant, reducing manual updates and ensuring consistency across deployments.
The following attributes often require standardization:
- Compensation Pay Types such as Base Pay, Variable Pay, and Supplemental Pay
- Exit Reasons such as Retirement and Involuntary/Voluntary
- Start Reasons such as Rehire and Growth
- Performance Rating such as High, Mid, and Low
- Talent Acquisition processes, stages, and outcomes, and how these map to the analytic model
- Locations
- Gender
- Ethnicity
- Manager Status
- Position Status for Position Management data
- Payout Types for Payroll data
- Absence Reasons for Absence data
- Succession Readiness Levels for Succession data
- Working Hours Types such as Overtime, Late, Absence, and Vacation
- Learning Activity Stages for Learning data
- Incident Category for Safety data
Establish the right approach to data standardization
Data standardization should be considered whenever a property or dimension is used in a concept within your solution.
Use the following questions to determine the approach that best meets your needs:
- How many unique values are in this dimension?
- Will the number of unique values increase over time?
- Will customers be allowed to provide or define their own unique values?
- Can customers enter custom values in data fields, or do they select from a predefined set of values you control?
- How easy or difficult is it to get a superset of values across all customers?
- How will you extract the superset of values from your customer base? Can you map to Visier’s concept values?
- What dataset will you use to set up the administrating tenant? Does it already have a superset of data values? If not, how will you include them as part of the dataset?
Data standardization approaches
Data standardization allows you to take full advantage of Visier’s blueprint inheritance feature. This ensures that key configurations propagate automatically to all analytic tenants, eliminating the need for redundant configurations and enabling easy scalability. For more information, see Blueprint Inheritance.
Within file mapping (recommended)
This approach incorporates the mapping within the data file itself. It streamlines the data standardization process and simplifies data management.
Advantages
- Resource efficiency: Eliminates the need for extra files, conserving processing and storage resources, ultimately reducing load times.
- Streamlined process: Removes the requirement for creating dummy employee records, simplifying the overall data handling process.
- Simplified effective dates: Bypasses the need to generate artificial, incremental effective dates, streamlining data management.
- Efficient troubleshooting: Facilitates easier identification and troubleshooting of data issues.
- Onboarding flexibility: Eliminates the need for a superset of values at the administrating tenant level during concept configuration, providing flexibility in the setup.
Disadvantages
- Additional records requirement: Extra records are needed within the file to accommodate raw values, potentially increasing the number of transformations required in your data pipeline.
Separate mapping file
This approach requires a separate mapping file that defines the relationship between your source data and Visier's standardized values. It offers flexibility and allows for easy updates.
Advantages
- Clear superset visualization: Facilitates easy visualization of a superset of values, making it easier to see all of the possible permutations in one place.
- Simplified data extraction: May offer a straightforward process to extract all values into a single file, reducing the number of transformations that need to happen.
Disadvantages
- Additional business rules: You will have to write new business rules to retrieve the relevant information within the mapping file, which adds complexity to the implementation. For example, a business rule is required to look up the standardized value from the mapping file.
- Troubleshooting: It can be more challenging to identify where certain values are coming from. Because the standardized value is not present in the subject source, troubleshooting involves investigating multiple files.
- Increased data load overhead: Additional files need to be sent with each load, potentially impacting efficiency.
- Dummy employee record requirement: A dummy employee record must be generated for the assignment of values, introducing an extra step in the mapping process.
- Incremental effective dates challenge: Requires the generation of incremental effective dates to align with Visier’s loading requirements, adding complexity to the implementation process.
- Risk of data load failures: The absence of mapping files may lead to data load failures if fileset validation is enabled.
Concept configuration per tenant
This approach requires giving your implementation team access to Studio for concept configuration tailored to each tenant. This may be the only feasible approach if standardization is impractical and the manual effort is manageable.
Advantages
- Flexible onboarding: Allows for a seamless onboarding process even when standardizing data proves challenging.
- Reduced dependency on superset values: Eliminates the need for a superset of values at the administrating tenant level during concept configuration, providing flexibility in the setup.
Disadvantages
- Scale challenges: Delivery and maintenance can pose challenges as the scale of operations increases.
- Training requirements: Requires additional training for implementation teams to effectively use Visier.
- Ongoing commitment: Requires continuous effort for each new customer onboarding process.
Example
This example illustrates why data standardization is important and how you can implement the different approaches.
You're a partner administrator who needs to load and configure data for two analytic tenants (customers). In this example, we'll focus on the Employment Start events and the Start Type dimension that is used to configure the Employee Starts Model in Visier.
Start by extracting a superset of values from all existing customers to ensure new customers’ values align with the standardized set. For example, standardizing New hire, Hired, and Hire under a single category ensures that future customer data integrates seamlessly without repeated configuration.
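As a rough sketch of the extraction step, the superset can be thought of as the union of distinct raw values across customers. The customer keys and the `build_superset` helper below are hypothetical and only illustrate the idea; they are not part of Visier.

```python
def build_superset(values_by_customer):
    """Return the sorted set of distinct raw values seen across all customers."""
    superset = set()
    for values in values_by_customer.values():
        # Trim stray whitespace so "New hire " and "New hire" count as one value.
        superset.update(value.strip() for value in values)
    return sorted(superset)


# Illustrative extracts; the customer keys are hypothetical.
raw_start_reasons = {
    "customer_1": ["New hire", "New hire "],
    "customer_2": ["Hired"],
    "customer_3": ["Hire", "Hired"],
}

superset = build_superset(raw_start_reasons)
print(superset)  # ['Hire', 'Hired', 'New hire']
```

Reviewing a list like this makes it easier to decide which raw values belong under the same standardized category.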
Customer A’s data has start reasons of New hire and Re-hire:
Hire Date | Employee ID | Start Reason |
---|---|---|
2024-01-04 | A1234 | New hire |
2024-02-12 | A5678 | Re-hire |
Customer B's data has start reasons of Hired and Rehired:
Hire Date | Employee ID | Start Reason |
---|---|---|
2024-01-04 | B1234 | Hired |
2024-02-12 | B5678 | Rehired |
These start reasons mean the same thing, but loading the data as-is would require all four values to be provided as sample data to the administrating tenant. Additionally, the Employee Starts Model in the administrating tenant would need to be configured as follows:
Employee Starts Model - Concept Group | Start Reason - Dimension Member Values |
---|---|
New Starts | "New hire", "Hired" |
Returning Employees | "Re-hire", "Rehired" |
This configuration, published to Production, is inherited by both analytic tenant applications, enabling the Employee Starts Model for Customers A and B.
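One way to picture this configuration is as a lookup from concept group to the raw member values it contains. The sketch below is only an illustration in Python, not how Visier stores concepts; the function names are hypothetical.

```python
# Concept groups and their member values, as configured for Customers A and B.
EMPLOYEE_STARTS_MODEL = {
    "New Starts": {"New hire", "Hired"},
    "Returning Employees": {"Re-hire", "Rehired"},
}


def concept_group(start_reason, model=EMPLOYEE_STARTS_MODEL):
    """Return the concept group containing the value, or None if unmapped."""
    for group, members in model.items():
        if start_reason in members:
            return group
    return None


def unmapped_values(values, model=EMPLOYEE_STARTS_MODEL):
    """Return the distinct values that no concept group accounts for."""
    return sorted(v for v in set(values) if concept_group(v, model) is None)


customer_a = ["New hire", "Re-hire"]
customer_b = ["Hired", "Rehired"]
print(unmapped_values(customer_a + customer_b))  # []
```

Every raw value has to appear in the configuration explicitly. Any value outside the configured members returns `None` and is not counted in either group.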
Let's say you have a new customer and they have start reasons of Hire and Rehire. These values do not match the existing Starts Model configuration. The sample data in the administrating tenant must be updated to include these new values, which requires a new project to be created. As you can see, this is no longer a scalable deployment, so you need to standardize the Start Reasons. There are several options for doing this:
Within file mapping (recommended)
This approach does not rely on a separate mapping file. Instead, both your customer values and the Visier values are provided in the same file. Embedding mappings within data files streamlines standardization, reduces complexity during the implementation phase, and simplifies ongoing data management.
This is the recommended approach due to the reduction in implementation complexity and ability to manage standardization at scale. You can configure the Employee Starts Model with the VisierExitReason column.
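A minimal sketch of what this could look like in a data pipeline, assuming the standardized value is added as an extra column before the file is sent. The column name `VisierStartReason`, the standardized values `Hire` and `Rehire`, and the helper function are all hypothetical choices for this Start Reason example.

```python
import csv
import io

# Hypothetical mapping from raw customer values to standardized values.
START_REASON_MAP = {
    "New hire": "Hire",
    "Hired": "Hire",
    "Re-hire": "Rehire",
    "Rehired": "Rehire",
}


def add_standard_column(rows, mapping, raw_column="Start Reason",
                        standard_column="VisierStartReason"):
    """Return copies of the rows with the standardized value next to the raw one."""
    result = []
    for row in rows:
        raw = row[raw_column]
        if raw not in mapping:
            # Fail loudly so unmapped values are caught before loading.
            raise KeyError(f"No standardized value for {raw!r}")
        result.append({**row, standard_column: mapping[raw]})
    return result


customer_a_rows = [
    {"Hire Date": "2024-01-04", "Employee ID": "A1234", "Start Reason": "New hire"},
    {"Hire Date": "2024-02-12", "Employee ID": "A5678", "Start Reason": "Re-hire"},
]
standardized = add_standard_column(customer_a_rows, START_REASON_MAP)

# Write the enriched rows to CSV text that carries both columns.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(standardized[0].keys()))
writer.writeheader()
writer.writerows(standardized)
print(buffer.getvalue())
```

Because the raw and standardized values travel in the same record, tracing a standardized value back to its source value only requires looking at one file.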
Separate mapping file
This approach requires you to maintain a separate file containing a superset of data values for specified attributes. This file is used to map source data values to Visier's standardized categories. This approach is useful when flexibility is required, such as when handling frequent updates or incorporating custom mappings without modifying the original data files.
The superset file should include columns containing the:
- Standardized value (the Visier value)
- Non-standardized value (the Partner value, or raw value coming from your source system)
In this example, the column VisierExitReason contains the value we expect and the adjacent column is the partner value. This then feeds into the Employee Exit Model concept, which is configured on the Visier value. Any new customer values will automatically map to the correct bucket.
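The lookup that the mapping file enables can be sketched as follows. In Visier this lookup is performed by a business rule; the Python below only illustrates the logic. The `PartnerExitReason` column name and the sample rows are assumptions made for the illustration.

```python
import csv
import io

# Hypothetical mapping file: Visier value first, partner (raw) value beside it.
MAPPING_CSV = """VisierExitReason,PartnerExitReason
Retirement,Retired
Voluntary,Resigned
Involuntary,Terminated
"""


def load_mapping(text):
    """Build a partner-value -> Visier-value lookup from the mapping file text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["PartnerExitReason"]: row["VisierExitReason"] for row in reader}


def standardize(raw_value, mapping):
    """Return the standardized value, or None when the raw value is unmapped."""
    return mapping.get(raw_value)


mapping = load_mapping(MAPPING_CSV)
print(standardize("Resigned", mapping))  # Voluntary
```

A raw value that is missing from the mapping file returns `None` here, which is a reminder that new customer values only map automatically once they have a row in the file.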
Concept configuration per tenant
This approach requires giving your implementation team access to Studio for concept configuration tailored to each tenant. The process involves identifying key concepts for configuration, with implementation teams working closely with customers to understand their specific requirements for mapping each concept. A common example is performance ratings, where each customer has a different performance scale. For instance, Customer A may use a scale of one to five, and Customer B may use a scale of one to ten. For Customer A, a 4 may be considered a high performer, while Customer B considers a 4 to be a low performer.
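The performance rating case can be sketched as a per-tenant configuration, where the same raw rating lands in a different band depending on the tenant. The tenant keys and threshold values below are hypothetical; in practice each customer's cut-offs come from working with that customer.

```python
# Hypothetical per-tenant thresholds for grouping ratings into bands.
RATING_BANDS = {
    "customer_a": {"scale_max": 5, "high_from": 4, "mid_from": 3},
    "customer_b": {"scale_max": 10, "high_from": 8, "mid_from": 5},
}


def rating_band(tenant, rating):
    """Classify a raw rating as High, Mid, or Low using the tenant's own scale."""
    bands = RATING_BANDS[tenant]
    if not 1 <= rating <= bands["scale_max"]:
        raise ValueError(f"Rating {rating} is outside the scale for {tenant}")
    if rating >= bands["high_from"]:
        return "High"
    if rating >= bands["mid_from"]:
        return "Mid"
    return "Low"


print(rating_band("customer_a", 4))  # High
print(rating_band("customer_b", 4))  # Low
```

Each new tenant needs its own entry, which is why this approach requires ongoing effort as the number of customers grows.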